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Computing  Optimal  Sequential  Allocation  Rules  In  Clinical  Trials* 


by 


Michael  M.  Katehakis  and  Cyrus  Herman 
SUNY  at  Stony  Brook  Columbia  University 


Abatract 

The  problem  of  assigning  one  of  several  treatments  in  clinical 
trials  ia  formulated  as  a  discounted  bandit  problem  that  was  studied 
by  Gittins  and  Jones.  The  problem  involves  comparison  of  certain 
state  dependent  indices.  A  recent  characterization  of  the  index  is 
used  to  calculate  more  efficiently  the  values  of  these  indices. 

1.  Introduction:  We  consider  the  well  known  problem  of  optimal  allocation  of 
treatments  in  clinical  trials.  A  simple  version  of  the  problem  is  as  follows. 
There  are  several  possible  treatments  for  a  given  disease.  When  a  particular 
treatment  n  ia  used  it  is  either  effective  with  unknown  probability  0n  or 
not  effective  with  probability  1  -  0n  .  The  problem  is  to  find  a  sequential 
sampling  procedure  which  maximizes  a  measure  of  the  expected  total  number  of 
treatment  successes.  When  the  planning  horizon  ia  infinite,  prior  distri¬ 
butions  are  assigned  to  the  unknown  parameters,  and  one  takes  the  expected 
total  discounted  number  of  successes  as  the  relevant  measure  of  peformance  of 
a  sequential  sampling  procedure,  the  problem  can  be  put  into  the  form  of  a 
discounted  version  of  the  bandit  problem  treated  successfully  by  Gittins  and 
Jones  (1974).  The  original  formulation  of  the  multi  armed  bandit  problem  and 
the  sequential  clinical  trials  problem  is  due  to  Robbins  (1952).  Gittins  and 
Jones  showed  that  there  is  an  index  associated  with  each  state  of  each  bandit 

*Work  supported  by  USAF  Contract  AFOSR  840136,  NSF  Grants  DMS-84-05413, 
BCS-85-07671  and  ONR  Contract  N00014-84-K-0244. 
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such  that  an  optimal  procedure  always  uses  the  bandit  with  the  largest  current 
index  value.  Recently,  Katehakis  and  Veinott  (1985)  have  obtained  a  new  chara¬ 
cterization  of  the  index  which  allows  the  index  to  be  sore  easily  calculated. 
The  purpose  of  this  paper  is  to  illustrate  the  calculation  of  the  index  in  the 
context  of  the  clinical  trials  problem  using  this  new  characterization. 


2.  CossHitins  Dynamic  Allocation  Indices:  Suppose  N  treatments  are  available 
for  treating  patients  with  a  certain  disease.  Let  Yn(k)  =  1  (  Yn(k)  =  0  ) 

denote  the  outcome  that  the  n*h  treatment  has  been  successful  (unsuccess¬ 
ful)  the  k^h  time  it  is  used.  At  times  t  =  1,2,...,  based  on  past 
observations,  one  has  to  decide  which  treatment  to  allocate  to  the  next  patient. 
At  the  start  of  the  experiment  we  assume  that  0n  is  a  random  variable  with 
Beta  prior  density  with  parameter  vector  (an,  bn);  i.e.,  0n  has  the  prior 
density 

(i)  g  <e)  =  ru  )r(b  ){r(a  +  b  )r1ean_1(i-e)bn~1  ,  vb»o, 

n  n  n  n  n 

where  in  (1)  a  ,  b  are  strictly  positive  constants.  Furthermore,  we  assume 
n  n 

that  0J,..., are  independent.  If  after  k  trials  using  treatment  n  we 

let  x  (k)  =  ( s  (k),  f  (k))  ,  where  s  (k)  (/( k))  denotes  the  number  of 

n  n  n  n  n 

successes  (the  number  of  failures)  then,  the  posterior  density  of  0n  given 

x  (k)  is  also  Beta  with  parameter  vector  (a  +  s  (k),  b  +  /  (k))  .  Thus, 
n  n  n  n  n 

the  information  obtained  during  the  first  k  trials  from  treatment  n  is 


summarized  by  xn(k)  .  Furthermore,  (xn(k),  k*l}  is  a  Markov  chain  on 
S  =  {(s.f)  ,  s,f  =  0,1,2,...}  with  transition  probabilities  given  by 
(2)  P(xn(k+1)  =  (s+l,f)  |  xn(k)  =  (a,f)) 

=  1  -  P(xn(k+1)  =  (s,f+l)  |  xn(k)  =  (s,f)) 


=  P(Yn(k+l)  =  1  |  x„(k)  =  (s,f)) 


an  1  * 

+  b.  +  s  +  f 
n  n 


'**  N  J 


r 


% 


The  problem  is  to  determine  a  policy  Tt  which  maximizes  the  expected 
discounted  number  of  successes;  i.e.,  to  maximize  w(n,a) 

(3)  w(n,a)  =  }...}  E(  rt®x  at_1  Yn(t))  g^de^ . . •*„(<»„>  . 

where  Yn(t)  is  Y„(k)  if  at  tine  t  treatment  it(t)  is  used  for  the  k**1 
tine  and  a  €  (0,  1)  is  a  discount  factor.  An  interpretation  of  the  discount 
factor  a  is  that  1  -  a  is  the  probability  that  at  any  given  tine  the 
entire  experinent  will  be  terminated.  Stated  otherwise,  there  are  N  Markov 
chains;  the  problem  is  to  sequentially  activate  one  of  them,  leaving  the 
others  inactive,  in  order  to  maximize  the  expected  total  discounted  reward. 

In  this  case  the  expected  reward  at  any  time  is  the  expected  posterior  proba¬ 
bility  of  success  associated  with  the  state  of  the  activated  Markov  chain; 
i.e.,  if  the  n^  chain  is  activated  for  the  k^h  time  when  xn(k)  =  (s,f)  , 
then,  the  corresponding  reward  is 

(4)  rn(s,f)  =  E(Y„(k+l)  |  xn(k)  =  (s,f ) ) 

a_  +  s 
-  n 

"a  +  b  +  s  +  f  ' 
n  n 

Within  the  context  of  this  formulation,  Gittins  and  Jones  (1974)  showed  that 
this  problem  can  be  reduced  to  N  one  dimensional  problems.  Each  of  the 
latter  problems  involves  a  single  Markov  chain  and  its  solution  is  the 
calculation  of  a  dynamic  allocation  index  mn(s,f)  associated  with  the  current 
state  (s,f)  of  the  Markov  chain.  Then,  at  each  point  of  time  an  optimal  po¬ 
licy  for  the  original  problem  is  such  that  it  activates  the  chain  with  the 
largest  current  index  value.  Based  on  an  earlier  characterization  of 
(1-a)  ^mnfs.f)  ,  Gittins  and  Jones(1979)  used  an  algorithm  for  computing  opti¬ 
mal  policies.  Recently,  Katehakis  and  Veinott  (1985)  have  obtained  a  different 
characterization  of  the  index.  This  characterization  casts  the  calculation  of 
the  index  into  the  form  of  a  familiar  replacement  problem,  e.g.,  see  Derm an 


r,r 


V  •**  V V ■  »  •*.'»  J»V  •  *  A  .V.  ‘.'a'  ■ 


cnjruro 


(1970,  pp.  121).  Namely,  if  C  is  the  class  of  policies  R  for  controlling 
(xn(k)  ,  k*l}  by  either  allowing  it  to  continue  or  to  instantaneously  restart 
it  at  its  initial  state  xD(l)  =  (s,f)  ,  then. 


(5)  »n(a,f)  =  sup  {E  (£®  <xk-1rn(xn(k) )  |  xn(l)  =  (s,f))}  . 

R  R  K—  1 

We  next  show  that  (5)  can  be  used  to  evaluate  mn(s,f)  with  sufficient 
accuracy.  In  the  sequel  we  will  be  concerned  with  a  single  treatment;  for 
notational  simplicity  we  will  drop  the  subscipt  n  .  Since  computing  m(s,f) 
is  essentially  the  same  as  computing  m(0,0)  -  it  only  involves  changing  the 

prior  vector  from  (a,b)  to  (a  +  a,  b  +  f)  -  it  suffices,  without  loss  of 
generality,  to  discuss  only  the  computation  of  m(0,0)  .  It  is  well  known  that 
solving  (5)  for  a  fixed  initial  state  (0,0)  involves  solving  the  dynamic 
programming  equations 

(6)  »(.,f)  =  »  (  g-S-g  *  «  (g-f-g  V(1,0)  ♦  g-£-g-  V(0,1)1  , 


_ a  + 

a  +  b 


TTT7  + « trvf.Vv#  * 


a.bll.  f  vu.m)]} 


V  (s,f)  €  S  . 

That  the  above  equation  (6)  is  for  computing  a(0,0)  is  reflected  in  the 
appearance  of  the  terms  V(1,0)  and  V(0,1)  in  the  right  side  of  (6).  Given 
the  solution  (V(s,f),  V  (s,f)  e  S}  of  (6)  then  m(0,0)  =  V(0,0)  . 

Equation  (6)  is  of  the  form  V(s,f)  =  TsfV  or  equivalently 


(7)  V  =  TV 

where  in  (7)  V  is  the  vector  of  values  (V(s,f)}  and  T  is  a  contraction 
operator  on  a  complete  metric  space.  Thus,  it  has  a  unique  bounded  solution. 
In  computing  the  solution  of  (7)  we  consider  the  finite  subset 
=  ((s.f)  €  S  :  s  +  f  *  1}  and  the  two  systems  of  equations 


s  +  f  *  L, 
s  +  f  =  L, 
s  +  f  <  L, 
s  +  f  =  L. 

We  will  use  the  following  more  compact  notation  for  (8)  and  (9) 

(8c>  °L  =  T1°l  • 

<9c)  Ut  =  I2Ut  . 

The  transformations  T  ,  ,  Tg  are  monotone  contractions,  thus,  succes¬ 

sive  approximations  will  converge  to  their  unique  fixed  points  for  any  initial 


points 

V<°>, 

u<°>.  U<°>  .  That  is. 

(10) 

lim 

v(n)  =  11. 

ly'n-l*  ,  ,  _ 

n-xio 

n-*co 

(11) 

lim 

n-w> 

31 

It 

T  -  „ 

T1  '  l  ' 

(12) 

lim 

UT(n)  =  lim 

T,U,(n“U  =  U.  . 

rr+oo 

L  ir*co 

2  L  L 

Moreover,  if 

the  points 

uj^ ,  0^^  are  chosen  propitiously,  the  con- 

vergence  in  (10),  is  from  below  or  above  as  desired  and  from  below  (above)  in 

(ID  ((12)). 


An  algorithm  to  compute  V(0,0)  based  on  (10)  involves  an  infinite  number 
of  variables;  however,  propositions  1  and  2,  below,  allow  us  to  use  (11)  and 

(12)  which  involve  only  a  finite  number  of  variables.  The  proof  of  proposition 
1  is  easy  and  it  is  omitted. 

Proposition  1:  For  equations  (7),  (8)  and  (9)  we  have 

(13)  ; ;  f (1 "  Dt,~1  *  *  (1 "  a)_1  for  a11  {s’f)  €  s  • 

and 


(8a) 

uL(s,f) 

ii 

> 

.  if 

(8b) 

Uj^S.f) 

a  +  s  1 

,  if 

'a+b  +  s  +  f  1-a 

(9a) 

UL(s,f) 

=  TsfUt 

,  if 

(9b) 

UL(s,f) 

1 

"1-a 

,  if 

5 


(14)  u^(s,f)  *  V(s,f)  *  U^(s,f)  ,  for  all  (s,f)  such  that  s  +  f  *  L  . 

Propoaition  2:  For  any  €  *  0  there  exist  an  Lq  =  L(c)  such  that 

(15)  UL(0,0)  -  uL(0,0)  *  €  ,  for  all  L  *  LQ  . 

Proof:  Because  of  (14)  it  suffices  to  show  that  for  any  positive  constants 
and  e2  there  exist  Lj  =  L(€^)  and  L2  =  L(c2)  such  that 

(16)  UL(0,0)  -  V(0,0)  *  Cj  ,  for  all  L  *  ^  , 


(17)  V(0,0)  -  uL(0,0)  *  €2  ,  for  all  L  *  I<2  . 

We  only  prove  (16)  since  the  proof  of  (17)  is  analogous. 

If  we  take  =  (1  -a)  ^  in  (10)  and  (12)  then,  for  any  L  and  all 

n  *  L  we  obtain  that 

(18)  u[n)(0,0)  =  V(n) (0,0)  , 

and  the  convergence  in  (10),  (12)  is  fro«  above;  thus,  using  (10)  and  the  fact 
that  V(s,f)  *  0  we  have 

(19)  V(n)(0,0)  -  V(0,0)  *  <xnsup{  V(0)(s,f)  -  V(s,f)  }  *  an(l  -a)-1  . 

(s.f) 

It  follows  from  (18),  (19)  that  for  any  L  and  for  all  n  *  L 

(20)  u[n)(0,0)  -  V(0,0)  «  an(l  -a)"1  . 

Similar  arguments  using  (12)  imply  that  for  all  n  *  1 

(21)  u[n)(0,0)  -  UL(0,0)  «  »"( 1  -a)'1  . 

Thus,  using  (20)  and  (21)  it  is  now  easy  to  complete  the  proof  of  (16). 

fe»rk:  It  was  assumed  that  each  clinical  trial  resulted  either  in  a  success 

or  in  a  failure.  The  methodology  described  here  extends  straightforwardly  to 

the  case  where  the  outcome  of  a  trial  can  be  classified  into  c  ,  c  *  2  , 

classifications.  Then  the  parameter  0  ,  is  a  vector  (0*,...,0C)  where  0i 

n  n  n  n 


-v 


.  v  v  s  v.  v.v. .v.v.v  -  -.*  ■ >  .v.v.  v  v_>_v  • 


•  .'-u- 


is  the  probability  of  the  trial  resulting  in  the  i*b  classification.  The 
Beta  prior  is  replaced  by  a  Dirichlet  prior  and  the  state  space  becomes 
S  =  {(Sp...,sc)  ,  s^  =  0,1,...  }  ,  where  s^  denotes  the  number  of  trials 
resulting  in  classification  i(l*i*c).  The  reward  is  a  given  function 
of  the  classification;  see  .also,  Glazebrook  (1978). 


3.  Computations:  For  a  given  (a,b)  in  order  to  compute  m(0,0)  -  V(0,0)  we 
use  transformations  and  T£  starting  from 


.(0) 


u^"(s,f) 


a  +  s 


1 


a  +  b  +  s  +  fl-ct 


and 


„(0) ,  _  1 

ul  (s-f)  -  • 


We  choose  t  sufficiently  large  according  to  proposition  1  and  iterate  until 


the  difference;  (j£n^(0,0)  -  u£n^(0,0)  is  small  .  We,  then,  take  as  our 


approximation  to  V(0,0)  the  mid  point  of  the  final  interval. 

Since  there  is  always  an  error  in  computing  the  indices,  the  possibility 
of  not  using  an  optimal  policy  always  exists.  In  our  context,  here,  this  can 
be  overcome  by  doing  enough  computations  to  guarantee  that  in  computing  the 
indices  the  bounding  intervals  do  not  overlap.  However  in  general,  Katehakis 
and  Veinott  (1985)  have  shown  that  if  the  computed  indices  are  close  to  the 
exact  indices  then  the  expected  discounted  return  of  the  policy  based  on  the 
computed  indices  will  be  close  to  the  optimal  expected  discounted  return. 

In  the  following  tables  the  results  of  some  calculations  are  tabulated. 
There  is  a  separate  table  for  each  value  of  a  =  .5,  .75,  .9  .  An  entry  in 
cell  (a+s,b+f)  is  the  index  for  a  treatment  having  prior  (a,b)  and  in 
state  (s,f)  . 

Note  that  the  numbers  in  table  2  (for  a+s,  b+f  =  1,2,... 5)  are  consistent 


with  those  published  by  Gittins  and  Jones  (1979). 


Table  1  (a  =  .5) 


.751 

.560 

1.071 

.859 

1.257 

1.051 

1.379 

1.187 

1.466 

1.288 

1.683 

1.558 

1.824 

1.747 

1.878 

1.822 

1.906 

1.863 

1.924 

1.888 

1.961 

1.942 

2 

3 

1.702 

1.272 

2.303 

1.856 

2.642 

2.221 

2.863 

2.476 

3.019 

2.663 

3.410 

3.164 

3.666 

3.516 

3.766 

3.657 

3.819 

3.734 

3.853 

3.783 

3.923 

3.886 
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.444  .  367 


.715  .611 


.902  .789 


1.040  .925 


1.147  1.032 


1.449  1.354 


1.675  1.609 


1.769  1.720 


1.821  1.781 


1.854  1.820 


1.923  1.905 


.039  .019 


.078  .039 


.115  .058 


.150  .077 


.184  .096 


.337  .183 


.575  .335 


.754  .463 


.892  .573 


1.670  1.433  1.254  1.115  1.003  .668 


.194  .099 


.351  .188 


.482  .269 


.592  .342 


.688  .410 


1.017  .677 


1.344  1.008 


,066  .049 


,128  .097 


,  186  . 142 


,  240  . 185 


,291  .226 


.507  .405 


,807  .672 


1.507  1.207  1.005  .862 


1.605  1.338  1.148  1.004 


1.819  1.668  1.540  1.430  1.335  1.001 


.007  .829 


.548  1.322 


.909  1.672 


.174  1.935 


.378  2.143 


.948  2.758 


.375  3.245 


.554  3.456 


.652  3.574 


.715  3.649 


.849  3.813 


.428  .212  .139  .104 


.754  .397  .267  .201 


1.018  .563  .386  .293 


1.240  .712  .497  .381 


1.429  .848  .600  .463 


2.076  1.383  1.034  .824 


2.715  2.039  1.631  1.358 


3.033  2.431  2.026  1.737 


3.224  2.691  2.308  2.020 


3.351  2.877  2.519  2.240 


3.643  3.342  3.087  2.867 


.083  .040 


.161  .080 


.236  .119 


. 308  . 157 


.377  .194 


.685  .370 


1.163  .676 


1.519  .933 


1.795  1.153 


2.016  1.343 


2.676  2.008 


1 

2 

3 

4 

5 

10 

7.028 

5.001 

3.796 

3.021 

1 

2.488 

1.269 

.608 

.391 

.287 

.226 

.110 

7.999 

6.346 

5.163 

4.342 

3.721 

2.117 

1.099 

.732 

.545 

.433 

.212 

8.541 

7.071 

6.001 

5.184 

4.562 

2.785 

1.526 

1.039 

.784 

.629 

.313 

8.721 

7.538 

6.578 

5.809 

5.179 

3.333 

1.906 

1.322 

1.008 

.813 

.411 

8.904 

7.868 

6.996 

6.276 

5.676 

3.800 

2.249 

1.585 

1.219 

.989 

.506 

9.341 

8.694 

8.103 

7.572 

7.  iOl 

5.373 

3.582 

2.674 

2.129 

1.767 

.951 

9.620 

9.243 

8.883 

8.543 

8.223 

6.905 

5.197 

4.160 

3.462 

2.964 

1.718 

9.729 

9.461 

9.201 

8.950 

8.710 

7.664 

6.157 

5.135 

4.403 

3.851 

2.363 

9.789 

9.580 

9.375 

9.177 

8.985 

8.121 

6.792 

I 

5.830 

5.102 

4.537 

2.912 

9.827 

9.655^ 

. 

9.486 

: 

9.322 

9.161 

8.426 

7.246 

1 

6.349 

i 

5.647 

l 

5.082 

3.387 

9.907 

9.816 

9.726 

9.637 

9.549 

9.128 

8.382 

7.745 

7.196 

6.719 

5.042 
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